Abstract: In recent years, multiple hypothesis testing has come to the forefront
of statistical research, ostensibly in relation to applications in genomics
and some other emerging fields. The false discovery rate (FDR) and its variants
provide very important notions of errors in this context comparable to the
role of error probabilities in classical testing problems. Accurate estimation of
positive FDR (pFDR), a variant of the FDR, is essential in assessing and controlling
this measure. In a recent paper, the authors proposed a model-based
nonparametric Bayesian method of estimation of the pFDR function. In particular,
the density of p-values was modeled as a mixture of decreasing beta
densities and an appropriate Dirichlet process was considered as a prior on the
mixing measure. The resulting procedure was shown to work well in simulations.
In this paper, we provide some theoretical results in support of the beta
mixture model for the density of p-values, and show that, under appropriate
conditions, the resulting posterior is consistent as the number of hypotheses
grows to infinity.



1. Introduction 

Consider the problem of testing $m$ null hypotheses $H_{0,1}, \ldots, H_{0,m}$ simultaneously,
where $m$ is a large number. This type of multiple hypothesis testing problem has
received a lot of attention in recent years, primarily due to advanced data collection
techniques in genomics, microarray analysis, proteomics, fMRI and some
other fields. The analog of type I error probability in multiple testing problems
is given by the family-wise error rate, which is defined as the probability of making
at least one false rejection. Such a measure is too stringent when $m$ is even
moderately large and will block many genuine discoveries (i.e., rejection of a false
null hypothesis). In a pioneering paper, Benjamini and Hochberg [2] introduced
the concept of the false discovery rate (FDR), the expected value of the ratio of
the number of false rejections to the total number of rejections, and described a
procedure to control it. Mathematically, the FDR at a nominal level $\gamma$ is given by
$E(V/\max(R,1)) = E(V/R \mid R > 0)\,P(R > 0)$, where $R = R(\gamma)$ stands for the number
of hypotheses rejected at nominal level $\gamma$ and $V = V(\gamma)$ is the number of false
rejections among these. Storey [11, 12] argued that the positive false discovery rate
(pFDR) (at nominal level $\gamma$), defined as $E(V/R \mid R > 0)$, is a more relevant measure
to control. Storey's approach consists of estimating the pFDR function at each $\gamma$
and choosing a $\gamma$ so that the estimated pFDR function is within a given limit, $\alpha$.
Storey [11, 12] showed that under a certain natural setup, the resulting procedure
controls pFDR by $\alpha$. Some other related measures have also been considered in the
literature; see Benjamini and Hochberg [2], Efron and Tibshirani [3], Tsai et al. [14]
and Sarkar [10].

In order to estimate the pFDR function, Storey [11] considered a mixture model
setup, where each null hypothesis has a fixed probability, $\pi$, of being true. Thus, the
number of true null hypotheses, $m_0$, is taken to be a random variable distributed
as binomial$(m, \pi)$. If the null hypothesis is true, then it is assumed that the
p-value associated with the corresponding test statistic is uniformly distributed. The
p-value when the alternative is true and has a fixed value $\theta$ follows a distribution
$H = H(\cdot \mid \theta)$. It is somewhat unnatural to assume that the alternative value
remains fixed when the hypotheses themselves are appearing randomly. A more
natural assumption would be to assume that, given that the null hypothesis is false,
the alternative is chosen randomly according to a distribution $\mu$. Then, marginally,
the conclusion that the p-value under the alternative is distributed as $H$ remains
unaffected, where now $H$ stands for the mixture $\int H(\cdot \mid \theta)\,d\mu(\theta)$. Under this setup,
Storey [11] showed that the pFDR at nominal level $\gamma$ is given by the expression
$\pi\gamma/[\pi\gamma + (1-\pi)H(\gamma)]$. To estimate the pFDR, it then suffices to estimate $\pi$, since
the denominator can be estimated essentially by the empirical proportion R/m. 
Actually, Storey [11] used a slightly different estimator to take into account the 
problem of zeros in finite samples. Estimation of $\pi$ is more delicate. Storey [11]
assumed that for some appropriate threshold value $\lambda$, all p-values over $\lambda$ are
associated with true null hypotheses. Equating the observed proportion of p-values
exceeding $\lambda$ with its expectation $\pi(1-\lambda)$ under this assumption, and choosing
$\lambda$ appropriately, an estimate of $\pi$, and hence that of pFDR, can be obtained.
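
The threshold-based estimate described above can be sketched in a few lines. This is
only an illustration under stated assumptions: the function names, the threshold
`lam = 0.5`, and the choice of beta(a, 1) alternatives are hypothetical, and the
finite-sample correction that Storey [11] applies for the problem of zeros is omitted.

```python
import random

def storey_pi0(pvalues, lam=0.5):
    """Estimate pi as #{p_i > lam} / (m * (1 - lam))."""
    m = len(pvalues)
    return min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))

def pfdr_estimate(pvalues, gamma, lam=0.5):
    """Plug-in estimate pi_hat * gamma / (R(gamma) / m)."""
    m = len(pvalues)
    r = max(sum(p <= gamma for p in pvalues), 1)  # guard against R = 0
    return min(1.0, storey_pi0(pvalues, lam) * gamma / (r / m))

def simulate_pvalues(m, pi, a, rng):
    """Null p-values are U[0,1]; alternatives are beta(a, 1), i.e. U**(1/a)."""
    return [rng.random() if rng.random() < pi else rng.random() ** (1.0 / a)
            for _ in range(m)]

rng = random.Random(0)
pi_true, a, gamma = 0.8, 0.1, 0.05
pvals = simulate_pvalues(20000, pi_true, a, rng)
pi_hat = storey_pi0(pvals)
pfdr_hat = pfdr_estimate(pvals, gamma)
# Model value pi*gamma / [pi*gamma + (1 - pi)*H(gamma)] with H(x) = x**a.
pfdr_true = pi_true * gamma / (pi_true * gamma + (1 - pi_true) * gamma ** a)
```

In this simulated setup the estimate of $\pi$ tends to sit slightly above the true
value, because some alternative p-values also exceed the threshold.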

Although Storey [11] did not make any explicit assumption about $H$, implicitly it
was assumed that $H$ is concentrated near zero. It is this assumption that leads to the
conclusion that almost every p-value over level $\lambda$ must arise from null hypotheses.
While this is reasonable, it introduces some bias in the analysis because, although
relatively rare, p-values bigger than $\lambda$ can occur under alternatives as well.

The density of p-values under alternatives usually has more features than is
assumed above. These important features may be exploited to construct a more refined
estimator of pFDR. For instance, the density of p-values under an alternative
value is often decreasing, dropping from an infinite height at 0 to a very low or
no height at 1, and the derivative of the density approaches zero near the point 1.
These densities resemble beta$(a, b)$ densities $\mathrm{be}(x; a, b) = (1/B(a,b))\,x^{a-1}(1-x)^{b-1}$
with $a < 1$ and $b > 1$, or their mixtures, where $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the
beta function. A reasonable model may be proposed for this type of densities, and
based on the model it may be possible to estimate the pFDR function more accurately.
Tang et al. [13] modeled the p-value density under the alternative as a
mixture of beta densities and thereby incorporated some of the salient features of
the p-value density directly into the model. They followed a Bayesian approach by
putting a Dirichlet process prior on the mixing distribution of the beta parameters.
The resulting posterior is amenable to Markov chain Monte Carlo methods of computation.
Tang et al. [13] showed by simulation that the resulting procedure gives
more stable and accurate estimates of the pFDR function.

In this paper, we theoretically study the appropriateness of the model assumptions
made in Tang et al. [13] and investigate the support of the Dirichlet mixture
of beta prior. Our results provide important theoretical justification for the setup
assumed in Tang et al. [13]. Under certain conditions, we show that the posterior 
distribution of the pFDR function is consistent as the number of hypotheses tends 
to infinity. 

2. Mixture model framework 
2.1. Basic setup 

Suppose we have observed the values of the test statistics for testing $m$ null
hypotheses $H_{0,i}$, $i = 1, \ldots, m$, against appropriate alternatives. Let $X_1, \ldots, X_m$ stand
for the p-values for the respective $m$ tests. We assume that the tests are based on
independent data, so that $X_1, \ldots, X_m$ are independent. We also assume that there
is a random mechanism which independently determines whether the $H_{0,i}$'s are true or
false, respectively with probability $\pi$ and $1 - \pi$. Let $H_i = I(H_{0,i} \text{ is true})$ be the
indicator that the $i$th null hypothesis is true. Of course, the $H_i$'s are unobserved.

The distribution of $X_i$ under $H_{0,i}$ can be assumed to be the uniform distribution
on [0, 1]. This happens whenever the test statistic is a continuous random variable
and the null hypothesis is simple, or in situations like the t-test or F-test, where
the null hypothesis has been reduced to a simple one by considerations of similarity 
or invariance. Under more general situations, the property can still be expected to 
be approximately true if, for instance, a conditional predictive p-value or a partial 
predictive p-value (Bayarri and Berger [1]) is used; see Robins et al. [9] for details. 
If the null and alternative hypotheses are one-sided and the underlying distribution 
has the monotone likelihood ratio (MLR) property, then the power function is 
increasing in the parameter, and, as a result, the null distribution of the p-value is 
stochastically larger than the uniform. Many estimation procedures remain valid in 
a conservative sense when the actual null distribution is replaced by the uniform. It 
is easy to show that Storey's estimators have this property. The Bayesian estimator 
of Tang et al. [13] also enjoys the same property; see Tang et al. [13] for discussion.
Henceforth we shall assume that the null distribution of p-values is U[0, 1]. 

Let f(x) stand for the density of the p-value under an alternative distribution. 
The following result shows that under a natural condition, f(x) is decreasing. 

Proposition 1. Suppose that the p-value is computed using a statistic, $T$, whose
density, $g_\theta$, has the MLR property. Then the p-value density $f(x)$ is decreasing.

Proof. Let $\theta_0$ stand for the value of the parameter under the null hypothesis and
$\theta_1$ stand for the value under the alternative. Let $T_{\mathrm{obs}}$ stand for the observed value
of $T$. Denote the cumulative distribution function (c.d.f.) of $g_\theta$ by $G_\theta$. Then the
distribution function of the p-value under $\theta_1$ is

\[ F_{\theta_1}(x) = P_{\theta_1}\bigl(P_{\theta_0}(T > T_{\mathrm{obs}}) \le x\bigr) = 1 - G_{\theta_1}\bigl(G_{\theta_0}^{-1}(1-x)\bigr). \]

Hence the p-value density is given by

\[ f(x) = f_{\theta_1}(x) = \frac{g_{\theta_1}\bigl(G_{\theta_0}^{-1}(1-x)\bigr)}{g_{\theta_0}\bigl(G_{\theta_0}^{-1}(1-x)\bigr)} = \frac{g_{\theta_1}(z)}{g_{\theta_0}(z)}, \tag{2.1} \]

where $z = G_{\theta_0}^{-1}(1-x)$. By the MLR property, the expression in (2.1) is increasing
in $z$, equivalently decreasing in $x$. □
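
As a numerical illustration of (2.1), the following sketch evaluates the p-value density
for a one-sided test in the normal location family, which has the MLR property. The
unit-variance normal model, the alternative mean `mu = 1.5`, and the function name are
assumptions made only for this example.

```python
from statistics import NormalDist
import math

STD = NormalDist()  # null distribution of the test statistic, N(0, 1)

def pvalue_density_one_sided(x, mu):
    """f(x) = g_{theta1}(z) / g_{theta0}(z) with z = G_{theta0}^{-1}(1 - x)."""
    z = STD.inv_cdf(1.0 - x)
    return NormalDist(mu, 1.0).pdf(z) / STD.pdf(z)

mu = 1.5
grid = [i / 200 for i in range(1, 200)]
dens = [pvalue_density_one_sided(x, mu) for x in grid]
is_decreasing = all(u > v for u, v in zip(dens, dens[1:]))

# Closed form of the same ratio for the normal family: exp(mu*z - mu^2/2).
closed = [math.exp(mu * STD.inv_cdf(1.0 - x) - mu * mu / 2.0) for x in grid]
max_gap = max(abs(u - v) for u, v in zip(dens, closed))
```

Here `is_decreasing` checks the monotonicity asserted by Proposition 1 on a grid, and
`max_gap` compares the ratio in (2.1) with its closed form for this family.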

For standard two-sided tests like the z-test or t-test, the density of the p-value under
the alternative is also decreasing. Under certain assumptions which are satisfied
generally, the following result shows a two-sided analog of the previous proposition.

Proposition 2. Suppose that the p-value is computed using a statistic $T$ whose
density $g_\theta$ is symmetric under the null hypothesis $H_0 : \theta = \theta_0$. Further suppose that
for the symmetrized density $\tilde g_\theta(z) = (g_\theta(z) + g_\theta(-z))/2$, the ratio $\tilde g_{\theta_1}(z)/g_{\theta_0}(z)$ is
increasing in $z$. Then the p-value density $f(x)$ is decreasing.

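Proof. With notations as in the last proof, the distribution function of the p-value
under $\theta_1$ is

\[ F_{\theta_1}(x) = P_{\theta_1}\bigl(2P_{\theta_0}(T > |T_{\mathrm{obs}}|) \le x\bigr)
   = 1 - G_{\theta_1}\bigl(G_{\theta_0}^{-1}(1 - x/2)\bigr) + G_{\theta_1}\bigl(-G_{\theta_0}^{-1}(1 - x/2)\bigr). \]

The p-value density can be seen to be given by

\[ f(x) = f_{\theta_1}(x) = \frac{g_{\theta_1}\bigl(G_{\theta_0}^{-1}(1 - x/2)\bigr)}{2 g_{\theta_0}\bigl(G_{\theta_0}^{-1}(1 - x/2)\bigr)}
   + \frac{g_{\theta_1}\bigl(-G_{\theta_0}^{-1}(1 - x/2)\bigr)}{2 g_{\theta_0}\bigl(-G_{\theta_0}^{-1}(1 - x/2)\bigr)}
   = \frac{\tilde g_{\theta_1}(z)}{g_{\theta_0}(z)}, \tag{2.2} \]

where $z = G_{\theta_0}^{-1}(1 - x/2)$; the last equality uses the symmetry of $g_{\theta_0}$. Since $z$ is
decreasing in $x$, the expression is decreasing in $x$ by the given assumption. □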

The p-value density for a one-sided hypothesis generally decays to zero as $x$
tends to 1. Let $L$ stand for the lower limit of the value of the test statistic, which
is often $-\infty$. Assume that as $z \to L$, we have that $g_{\theta_1}(z)/g_{\theta_0}(z) \to 0$. Then, clearly
it follows from (2.1) that $f(x) \to 0$ as $x \to 1$, since $z = G_{\theta_0}^{-1}(1 - x) \to L$ as $x \to 1$.

For a two-sided hypothesis, $g_{\theta_1}(z)/g_{\theta_0}(z)$ will not generally go to 0 as $z \to L$,
and hence the minimum value of the p-value density will be a (small) positive
number. For instance, for the two-sided normal location model, the minimum value
is $e^{-n\theta^2/2}$, where $n$ is the sample size on which the test is based.
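
The minimum value quoted for the two-sided normal location model can be checked
numerically from (2.2). The sketch below assumes unit-variance observations, a null value
of 0, and a z-statistic whose mean under the alternative is $\sqrt{n}\,\theta$; the
particular values `n = 4` and `theta = 0.5` are arbitrary.

```python
from statistics import NormalDist
import math

STD = NormalDist()  # null distribution of the z-statistic

def pvalue_density_two_sided(x, mu):
    """f(x) = [g1(z) + g1(-z)] / (2 g0(z)) with z = G0^{-1}(1 - x/2)."""
    z = STD.inv_cdf(1.0 - x / 2.0)
    alt = NormalDist(mu, 1.0)
    return (alt.pdf(z) + alt.pdf(-z)) / (2.0 * STD.pdf(z))

n, theta = 4, 0.5
mu = math.sqrt(n) * theta                  # mean of the z-statistic under the alternative
grid = [i / 200 for i in range(1, 201)]    # includes x = 1
dens = [pvalue_density_two_sided(x, mu) for x in grid]
minimum = dens[-1]                         # density at x = 1, where z = 0
claimed = math.exp(-n * theta ** 2 / 2.0)  # e^{-n theta^2 / 2}
```

On this grid the density is decreasing and its value at $x = 1$ agrees with
$e^{-n\theta^2/2}$ up to rounding error.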



2.2. Identifiability and continuity properties 

If a c.d.f. $F$ on [0,1] can be written as $F(x) = \pi x + (1 - \pi)H(x)$, where $H(\cdot)$ is
another c.d.f. on [0,1], then the representation is generally not unique, so that $\pi$ and
$H$ are not separately identifiable. The components $\pi$ and $H$ can be identified by
imposing the additional condition that $H$ cannot be represented as a mixture with
another uniform component, which, for the case when $H$ has a continuous density
$h$, translates into $h(1) = 0$. Define the map $\pi(F)$ from the space of continuous c.d.f.
on [0,1] to [0,1] as the maximum possible value of $\pi$ in the mixture representation
$F(x) = \pi x + (1 - \pi)H(x)$. As in all mixture problems, $H$ is not defined when
$\pi(F)$ is one, that is, $F$ is the uniform distribution on [0,1]. When $F$ physically
stands for the p-value distribution, $\pi(F)$ is an upper bound for the proportion of
true null hypotheses and therefore $\pi(F)\gamma/F(\gamma)$ is an upper bound for the actual pFDR.
Thus this choice of $\pi$ is appropriate in a conservative sense in that in order to
control pFDR, it suffices to control the auxiliary quantity $\mathrm{pFDR}(F; \gamma)$ defined as
$\pi(F)\gamma/F(\gamma)$.

Let $\mathcal{F}$ stand for all $F$ representable as $F(x) = \pi x + (1 - \pi)H(x)$ for $\pi \in [0, 1]$.
The following proposition shows an important upper-semicontinuity property of the
map $\pi(F)$. Let $\to_w$ stand for weak convergence of probability distributions.




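Proposition 3. The class $\mathcal{F}$ is weakly closed and the map $F \mapsto \pi(F)$ on $\mathcal{F}$ is
upper-semicontinuous, that is,

\[ \text{if } F_n \to_w F, \text{ then } \limsup_{n\to\infty} \pi(F_n) \le \pi(F). \]

Further, for any $\gamma$, $\limsup_{n\to\infty} \mathrm{pFDR}(F_n; \gamma) \le \mathrm{pFDR}(F; \gamma)$.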

Proof. Let $F_n \in \mathcal{F}$ and $F_n \to_w F$. Because $\pi_n = \pi(F_n)$ is a bounded sequence and
$H_n$ in the representation $F_n(x) = \pi_n x + (1 - \pi_n)H_n(x)$ is tight, we may assume
that both are convergent along a subsequence, to $\pi^*$ and $H^*$, respectively. Then
$F(x) = \pi^* x + (1 - \pi^*)H^*(x)$, and hence $F \in \mathcal{F}$.

Write $\bar F = 1 - F$. Observe that for any $F \in \mathcal{F}$, $\bar F(\lambda) \ge \pi(F)(1 - \lambda)$ for all
$0 < \lambda < 1$ and that $\pi(F) = \inf\{\bar F(\lambda)/(1 - \lambda) : 0 < \lambda < 1\}$. The infimum is
attained because, by our choice, $\pi(F)$ is the largest $\pi$ in the mixture representation.

Now for any fixed $\lambda_0$ which is a continuity point of $F$, we have that

\[ \limsup_{n\to\infty} \pi(F_n) = \limsup_{n\to\infty} \inf_{\lambda} \frac{\bar F_n(\lambda)}{1 - \lambda}
   \le \lim_{n\to\infty} \frac{\bar F_n(\lambda_0)}{1 - \lambda_0} = \frac{\bar F(\lambda_0)}{1 - \lambda_0}. \]

Since $\lambda_0$ is arbitrary and the set of continuity points of $F$ is dense in [0,1], the first
assertion follows.

The last relation clearly follows from the expression for pFDR. □ 
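
The map $\pi(F)$ can be approximated numerically as the infimum of
$(1 - F(\lambda))/(1 - \lambda)$ over a grid. The sketch below is illustrative only: the grid
size, the helper names, and the two alternative distributions (beta(1, 3), whose density
vanishes at 1, and beta(0.2, 1), whose density equals 0.2 at 1) are assumptions chosen to
contrast an identifiable case with a non-identifiable one.

```python
def pi_of_F(F, grid_size=100000):
    """Approximate pi(F) = inf over 0 <= lam < 1 of (1 - F(lam)) / (1 - lam)."""
    return min((1.0 - F(k / grid_size)) / (1.0 - k / grid_size)
               for k in range(grid_size))

def mixture_cdf(pi, H):
    """F(x) = pi * x + (1 - pi) * H(x)."""
    return lambda x: pi * x + (1.0 - pi) * H(x)

pi = 0.7
# H = beta(1, 3): the density 3(1 - x)^2 vanishes at 1, so pi is recovered.
F_thin = mixture_cdf(pi, lambda x: 1.0 - (1.0 - x) ** 3)
# H = beta(0.2, 1): the density equals 0.2 at x = 1, so H hides a uniform part.
F_fat = mixture_cdf(pi, lambda x: x ** 0.2)

pi_thin = pi_of_F(F_thin)   # close to 0.7
pi_fat = pi_of_F(F_fat)     # close to 0.7 + 0.3 * 0.2 = 0.76

def pfdr_bound(F, gamma):
    """Auxiliary quantity pFDR(F; gamma) = pi(F) * gamma / F(gamma)."""
    return pi_of_F(F) * gamma / F(gamma)
```

In the second case $\pi(F)$ exceeds the mixing weight 0.7, which is the conservative
direction: the resulting value of `pfdr_bound` is an upper bound for the pFDR computed
with the actual weight.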

Under additional restrictions, identifiability of the components $\pi$ and $H$ and
continuity of $\pi(F)$ may be established. For example, the following class of c.d.f.
$F$ allows $\pi$ and $H$ to be identified from $F$. Assume that the p-value distribution
$H$ under the alternative belongs to $\mathcal{D}$, the class of c.d.f. on [0,1] which admits a
density $h$, with $h(1) = 0$. Let $\mathcal{F}_{\mathcal{D}}$ denote the class of all c.d.f. on [0,1] of the form
$F(x) = \pi x + (1 - \pi)H(x)$ for $\pi \in (0, 1)$ and $H \in \mathcal{D}$. Let $f_{\pi,h} = \pi + (1 - \pi)h$ be the
corresponding mixture density.

Proposition 4. If $f_{\pi,h} = f_{\pi^*,h^*}$, then $\pi = \pi^*$ and $h = h^*$.

Proof. $f_{\pi,h} = f_{\pi^*,h^*}$ implies $\pi + (1 - \pi)h(x) = \pi^* + (1 - \pi^*)h^*(x)$ for all $x$. Putting
$x = 1$ and using the fact that $h(1) = h^*(1) = 0$, we have $\pi = \pi^*$. This now implies
$h = h^*$ or $H = H^*$. □

To study consistency, we need to show that $\pi$ and $h$ can be continuously solved
from $f$. However, the class $\mathcal{F}_{\mathcal{D}}$ is not weakly closed. We need to impose a restriction
on the class of alternative densities so that the tail at 1 remains thin even in the weak
limit. Let $\mathcal{B}$ denote a class of c.d.f. on [0,1] that is weakly closed and for all $H \in \mathcal{B}$ we
have $\lim_{y\to 0} y^{-1}\bar H(1 - y) = 0$, where $\bar H = 1 - H$. The interval $(1 - y, 1]$ is open in [0, 1].
Hence, by the portmanteau theorem, $H_n \to_w H$ implies that $\bar H(1 - y) \le \liminf_{n\to\infty} \bar H_n(1 - y)$.
Thus for the weak limit $H$ of a sequence $H_n \in \mathcal{B}$ to be in $\mathcal{B}$, one needs to be
able to interchange the order of the limits with respect to $y$ and $n$. For instance, if
$\mathcal{B} = \{H : \bar H(1 - x) \le \psi(x) \text{ for all } x < \delta\}$, where $\delta > 0$ is a fixed number and $\psi$ is
a fixed function which satisfies $\psi(x) = o(x)$ as $x \to 0$ (like $Cx^{1+\epsilon}$), then the class
$\mathcal{B}$ satisfies the requirement. Let $\mathcal{F}_{\mathcal{B}}$ denote the class of c.d.f. on [0,1] representable
as $F(x) = \pi x + (1 - \pi)H(x)$ for $\pi \in (0, 1)$ and $H \in \mathcal{B}$. Note that $\mathcal{F}_{\mathcal{B}}$ need not be a
subset of $\mathcal{F}_{\mathcal{D}}$ as the c.d.f. in $\mathcal{B}$ need not have a density.

Proposition 5. Identifiability in Proposition 4 holds if $F \in \mathcal{F}_{\mathcal{B}}$.

Proof. If $\pi x + (1 - \pi)H(x) = \pi^* x + (1 - \pi^*)H^*(x)$ for all $x$, then, with $\bar H = 1 - H$,

\[ \pi(1 - x) + (1 - \pi)\bar H(x) = \pi^*(1 - x) + (1 - \pi^*)\bar H^*(x). \]

Dividing both sides by $1 - x$ and letting $x \to 1$, we obtain $\pi = \pi^*$ and hence
$H = H^*$. □

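Proposition 6. The map $(\pi, H) \mapsto F_{\pi,H}$ is a homeomorphism from $(0, 1) \times \mathcal{B}$ to
$\mathcal{F}_{\mathcal{B}}$, where $\mathcal{B}$ and $\mathcal{F}_{\mathcal{B}}$ are equipped with the weak topology.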

Proof. (Forward side) If $\pi_n \to \pi$ and $H_n \to_w H$, then $H_n(x) \to H(x)$ at all
continuity points $x$, giving $\pi_n x + (1 - \pi_n)H_n(x) \to \pi x + (1 - \pi)H(x)$.

(Reverse side) Let $F_{\pi_n,H_n} \to_w F_{\pi,H}$. To show that $\pi_n \to \pi$ and $H_n \to_w H$, fix
any subsequence $n'$. It is enough to extract a further subsequence $n''$ along which
$\pi_{n''} \to \pi$ and $H_{n''} \to_w H$.

Because $\pi_{n'}$ is bounded and $H_{n'}$ is tight, we can extract a further subsequence
$n''$ such that $\pi_{n''} \to \pi^*$ and $H_{n''} \to_w H^*$ for some $\pi^*$ and $H^*$. By the closedness of
$\mathcal{B}$ under the weak topology, $H^* \in \mathcal{B}$ (note that $(1 - x, 1]$ is an open subset of [0, 1]).
By the forward side, $F_{\pi_{n''},H_{n''}} \to_w F_{\pi^*,H^*}$. Thus $F_{\pi^*,H^*} = F_{\pi,H}$. By identifiability
in the class $\mathcal{F}_{\mathcal{B}}$, $\pi^* = \pi$ and $H^* = H$, and hence $\pi_{n''} \to \pi$ and $H_{n''} \to_w H$. This
completes the proof. □



2.3. Mixtures of beta densities 

The shape of p-value densities under alternatives has similarities with the beta
density $\mathrm{be}(x; a, b) = (1/B(a,b))\,x^{a-1}(1 - x)^{b-1}$, $0 < x < 1$, for $a < 1$ and $b > 1$.
Indeed, for the exponential model $\lambda e^{-\lambda z}$, $z > 0$, with parameter $\lambda$ and hypotheses
$H_0 : \lambda = \lambda_0$ against $H : \lambda > \lambda_0$, it follows from elementary calculations that the
p-value density is exactly beta$(a, 1)$ for some $a < 1$. Mixtures of beta$(a, b)$ with
$a < 1$ and $b > 1$ make up a considerably large class still preserving the shape of the
p-value density, and hence can be considered as a model for p-value densities under
the alternative. The following result shows that many similar-shaped densities can
be pointwise represented as a mixture of beta$(a, 1)$, a much narrower class.

Recall that a function $\varphi$ on $[0, \infty)$ is called completely monotone if it has derivatives
$\varphi^{(n)}$ of all orders and $(-1)^n \varphi^{(n)}(z) \ge 0$ for all $z > 0$ and $n = 1, 2, \ldots$.

Proposition 7. If a density $h(x)$ on (0,1) with c.d.f. $H$ can be represented as
$h(x) = \int_0^1 a x^{a-1}\,dG(a)$ for all $0 < x < 1$, then $H(e^{-y})$ is a completely monotone
function of $y$ on $[0, \infty)$.

Conversely, if $h(x)$ is decreasing and $H(e^{-y})$ is completely monotone, then
$h(x) = \int_0^\infty a x^{a-1}\,dG(a)$ for some probability measure $G$ on $(0, \infty)$ with
$\int a^2\,dG(a) \le \int a\,dG(a)$.

Proof. If $h(x)$ is a mixture of be$(a, 1)$, we have that

\[ H(x) = \int_0^1 x^a\,dG(a) = \int_0^1 e^{-a \log x^{-1}}\,dG(a). \]

Thus $H(x)$ is the Laplace transform of $G$ at the point $\log x^{-1}$. Put $y = \log x^{-1}$ so
that $x = e^{-y}$ and $H(e^{-y}) = \int_0^1 e^{-ay}\,dG(a)$, the Laplace transform of the probability
measure $G$. Hence it is completely monotone by Theorem 1 of Section XIII.4 of
Feller (1971).

To prove the converse, applying the same theorem and using the fact that $H(1) = 1$,
we obtain the representation $H(e^{-y}) = \int_0^\infty e^{-ay}\,dG(a)$ for some probability
measure $G$ on $(0, \infty)$. Thus $H(x) = \int_0^\infty x^a\,dG(a)$, and so $h(x) = \int_0^\infty a x^{a-1}\,dG(a)$.
Now, as $h$ is decreasing, $0 \ge h'(x) = \int_0^\infty a(a - 1)x^{a-2}\,dG(a)$. The result now follows
by letting $x \to 1$. □

Observe that $\int a^2\,dG(a) \le \int a\,dG(a)$ holds if $G$ is concentrated on (0, 1], but it
is not necessary.
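
A small discrete example may help to see both points. In the sketch below, the mixing
measure $G$ has two atoms, one of them above 1 (the atoms, weights, step size, and helper
names are hypothetical choices for illustration). The moment condition still holds, the
mixture density is decreasing on a grid, and finite differences of $H(e^{-y})$ alternate
in sign as complete monotonicity requires.

```python
import math

atoms = [(0.3, 0.8), (1.5, 0.2)]   # (a, weight) pairs of a discrete G

def h(x):
    """Mixture density sum_k w_k * a_k * x^(a_k - 1)."""
    return sum(w * a * x ** (a - 1.0) for a, w in atoms)

def phi(y):
    """H(e^{-y}) = sum_k w_k * exp(-a_k * y), the Laplace transform of G."""
    return sum(w * math.exp(-a * y) for a, w in atoms)

def finite_difference(f, y, n, step=0.25):
    """n-th forward difference of f at y with the given step."""
    return sum((-1) ** (n - k) * math.comb(n, k) * f(y + k * step)
               for k in range(n + 1))

first_moment = sum(w * a for a, w in atoms)        # integral of a dG(a)
second_moment = sum(w * a * a for a, w in atoms)   # integral of a^2 dG(a)

grid = [i / 100 for i in range(1, 101)]
h_decreasing = all(h(u) > h(v) for u, v in zip(grid, grid[1:]))
# For a completely monotone function, (-1)^n times the n-th difference is >= 0.
signs_ok = all((-1) ** n * finite_difference(phi, y, n) > 0
               for n in range(1, 6) for y in (0.0, 0.5, 2.0))
```

The finite-difference check is only a necessary numerical symptom of complete
monotonicity, not a proof of it.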

Remark 1. By a similar argument, if a density $h(x)$ on (0, 1) can be represented
as $h(x) = \int b(1 - x)^{b-1}\,dG(b)$ for all $0 < x < 1$, then the function $\bar H(1 - e^{-y})$ is
completely monotone as a function of $y$, where $\bar H(x) = 1 - H(x)$.

Conversely, if $\bar H(1 - e^{-y})$ is completely monotone in $y$ and $h(x)$ is decreasing,
then $h(x) = \int_0^\infty b(1 - x)^{b-1}\,dG(b)$ for some probability measure $G$ on $(0, \infty)$ with
$\int b\,dG(b) \le \int b^2\,dG(b)$.

Proposition 8. Let $\mathcal{H}_1$ stand for the class of decreasing densities $h$ such that
$H(e^{-y})$ is completely monotone and $\mathcal{H}_2$ stand for the class of decreasing densities
$h$ such that $\bar H(1 - e^{-y})$ is completely monotone. A density $h(x)$ on (0, 1) can be
represented as a mixture of be$(a, b)$ if $h(x)$ is a convex combination of densities of
the form $c\,h_1(x)h_2(x)$ where $h_1 \in \mathcal{H}_1$ and $h_2 \in \mathcal{H}_2$.

Proof. Clearly it suffices to assume that $h(x) = c\,h_1(x)h_2(x)$, where
$h_1(x) = \int_0^\infty a x^{a-1}\,dG_1(a)$, $h_2(x) = \int_0^\infty b(1 - x)^{b-1}\,dG_2(b)$ and

\[ c^{-1} = \int_0^\infty \int_0^\infty ab\,B(a, b)\,dG_1(a)\,dG_2(b). \]

Now defining $dG(a, b) = c\,ab\,B(a, b)\,dG_1(a)\,dG_2(b)$, we may write
$h(x) = \int \mathrm{be}(x; a, b)\,dG(a, b)$. The total mass of $G$ is given by

\[ \int_0^\infty \int_0^\infty c\,ab\,B(a, b)\,dG_1(a)\,dG_2(b) = c\,c^{-1} = 1, \]

so that $G$ is also a probability measure. This completes the proof. □
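
The construction in the proof can be traced on a discrete example. In the sketch below,
$G_1$ and $G_2$ are hypothetical two-point mixing measures; the code forms the reweighted
measure $dG(a,b) = c\,ab\,B(a,b)\,dG_1(a)\,dG_2(b)$ and compares $c\,h_1(x)h_2(x)$ with the
resulting beta mixture at a few points.

```python
import math

def beta_fn(a, b):
    """Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def be(x, a, b):
    """Beta density be(x; a, b)."""
    return x ** (a - 1.0) * (1.0 - x) ** (b - 1.0) / beta_fn(a, b)

G1 = [(0.2, 0.5), (0.6, 0.5)]   # (a, weight): mixing measure for h1
G2 = [(2.0, 0.3), (5.0, 0.7)]   # (b, weight): mixing measure for h2

def h1(x):
    return sum(w * a * x ** (a - 1.0) for a, w in G1)

def h2(x):
    return sum(w * b * (1.0 - x) ** (b - 1.0) for b, w in G2)

# c^{-1} = double integral of a * b * B(a, b) dG1(a) dG2(b)
c = 1.0 / sum(wa * wb * a * b * beta_fn(a, b) for a, wa in G1 for b, wb in G2)

# dG(a, b) = c * a * b * B(a, b) dG1(a) dG2(b)
G = [(a, b, c * a * b * beta_fn(a, b) * wa * wb) for a, wa in G1 for b, wb in G2]

total_mass = sum(w for _, _, w in G)
max_gap = max(abs(c * h1(x) * h2(x) - sum(w * be(x, a, b) for a, b, w in G))
              for x in (0.05, 0.25, 0.5, 0.75, 0.95))
```

The total mass of `G` equals one and the two expressions for $h$ agree up to rounding
error, as the proof indicates.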

2.4. Dirichlet mixture prior

Tang et al. [13] proposed a Dirichlet process prior (Ferguson [5]) for the mixing
distribution $G$. The parameters of a Dirichlet process $\mathrm{DP}(G_0, \tau)$ are the center
measure $G_0 = E(G)$, and the precision parameter $\tau > 0$. The center measure $G_0$
is the subjective guess about $G$, while $\tau$ controls the concentration of $\mathrm{DP}(G_0, \tau)$
around $G_0$.

The equivalent hierarchical representation in terms of latent variables $(a_i, b_i)$,

\[ \begin{aligned}
X_i \mid a_i, b_i &\sim \pi + (1 - \pi)\,\mathrm{be}(x_i \mid a_i, b_i), \\
(a_i, b_i) \mid G &\stackrel{\text{iid}}{\sim} G, \\
G &\sim \mathrm{DP}(G_0, \tau),
\end{aligned} \]

is extremely useful in developing the relevant MCMC algorithms for the computation
of the posterior. Tang et al. [13] used the reparameterization $a = \exp(-|L_a|)$ and
$b = \exp(|L_b|)$, and specified $G_0(a, b) = N(L_a \mid 0, \sigma_a^2)\,N(L_b \mid 0, \sigma_b^2)$. Actually, any base
measure with full support on $(0, 1) \times (1, \infty)$ will lead to a Dirichlet process with
large support.
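
To make the prior concrete, the following sketch draws a random mixing measure by
truncated stick-breaking, which is a standard representation of the Dirichlet process
and not necessarily the computational scheme used by Tang et al. [13]. The truncation
level `n_atoms`, the function names, and the hyperparameter values are assumptions for
illustration; atoms are drawn from $G_0$ through the reparameterization
$a = \exp(-|L_a|)$, $b = \exp(|L_b|)$.

```python
import math
import random

def draw_dp_atoms(tau, sigma_a, sigma_b, n_atoms, rng):
    """Truncated stick-breaking draw G = sum_k w_k * delta_{(a_k, b_k)}."""
    atoms, remaining = [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, tau)                  # stick-breaking proportion
        a = math.exp(-abs(rng.gauss(0.0, sigma_a)))    # a = exp(-|L_a|) in (0, 1]
        b = math.exp(abs(rng.gauss(0.0, sigma_b)))     # b = exp(|L_b|) >= 1
        atoms.append((remaining * v, a, b))
        remaining *= 1.0 - v
    return atoms, remaining   # `remaining` is the mass lost to truncation

def sample_pvalue(pi, atoms, rng):
    """Draw X from pi * U[0,1] + (1 - pi) * sum_k w_k * be(a_k, b_k)."""
    if rng.random() < pi:
        return rng.random()
    weights = [w for w, _, _ in atoms]
    # rng.choices uses relative weights, so the truncated weights are renormalized.
    _, a, b = rng.choices(atoms, weights=weights, k=1)[0]
    return rng.betavariate(a, b)

rng = random.Random(1)
atoms, leftover = draw_dp_atoms(tau=1.0, sigma_a=1.0, sigma_b=1.0, n_atoms=50, rng=rng)
pvals = [sample_pvalue(0.8, atoms, rng) for _ in range(1000)]
```

Smaller values of `tau` put most of the mass on a few atoms, while larger values spread
the weights so that a draw of `G` resembles `G_0` more closely.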

3. Asymptotic properties of posterior 

Consider a prior $\Pi$ for $H$ and independently a prior $\mu$ for $\pi$ with full support on
[0, 1]. Let the true values of $\pi$ and $h$ be, respectively, $\pi_0$ and $h_0$, where $0 < \pi_0 < 1$.

Theorem 1 (General consistency). If $h_0$ belongs to the $L_1$-support of $\Pi$ in the sense
that $\Pi(\|h - h_0\|_1 < \epsilon) > 0$ for all $\epsilon > 0$, then for every $\epsilon > 0$,
$\Pr(\sup\{|F(x) - F_0(x)| : 0 < x < 1\} < \epsilon \mid X_1, \ldots, X_m) \to 1$ a.s.

Proof. For any sequence $F_n$ such that $F_n(x) \to F_0(x)$ for all $x$, continuity of $F_0$ and
Pólya's theorem imply that $\sup_x |F_n(x) - F_0(x)| \to 0$. Thus given $\epsilon > 0$, we can find
a weak neighborhood $W$ of $F_0$ such that $F \in W$ implies $\sup_x |F(x) - F_0(x)| < \epsilon$.
Thus it suffices to prove that for any weak neighborhood $W$ of $F_0$,

\[ \Pr\{|\pi - \pi_0| < \epsilon,\ F \in W \mid X_1, \ldots, X_m\} \to 1 \text{ a.s. as } m \to \infty. \]

By Schwartz's theorem for weak consistency (see Theorem 4.4.2 of Ghosh and
Ramamoorthi [8]), it suffices to show that for every $\epsilon > 0$,

\[ (\mu \times \Pi)\Bigl\{(\pi, h) : \int f_{\pi_0,h_0} \log \frac{f_{\pi_0,h_0}}{f_{\pi,h}} < \epsilon\Bigr\} > 0. \]

Now $f_{\pi,h} \ge \pi$, so $f_{\pi_0,h_0}/f_{\pi,h} \le \pi^{-1} f_{\pi_0,h_0}$, which is integrable, and the integral
$\pi^{-1}$ is bounded by a constant when $\pi$ lies in a neighborhood of $\pi_0$. So by Lemma
7 of Ghosal and van der Vaart [7] or Theorem 5 of Wong and Shen [15]

\[ \int f_{\pi_0,h_0} \log \frac{f_{\pi_0,h_0}}{f_{\pi,h}} \le A\, d_H^2(f_{\pi_0,h_0}, f_{\pi,h}) \log_+ \frac{1}{d_H(f_{\pi_0,h_0}, f_{\pi,h})}, \]

where $d_H$ stands for the Hellinger distance. Also, as $d_H^2(f, g) \le \|f - g\|_1$, it suffices
to show that $L_1$-neighborhoods of $f_{\pi_0,h_0}$ get positive probabilities under $\mu \times \Pi$.
Now,

\[ \begin{aligned}
\int_0^1 \bigl|[\pi + (1 - \pi)h(x)] - [\pi_0 + (1 - \pi_0)h_0(x)]\bigr|\,dx
  &\le |\pi - \pi_0| + |\pi - \pi_0| \int_0^1 h(x)\,dx + (1 - \pi_0) \int_0^1 |h(x) - h_0(x)|\,dx \\
  &\le 2|\pi - \pi_0| + \|h - h_0\|_1.
\end{aligned} \]



Since $\mu$ gives positive probabilities to neighborhoods of $\pi_0$ and $\Pi$ gives positive
probabilities to $L_1$-neighborhoods of $h_0$, the condition of prior positivity holds. □
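
The $L_1$ bound used in the last step of the proof can be checked numerically. The sketch
below uses be(1, b) densities, which are bounded on [0, 1], purely for numerical
convenience; the parameter values, the midpoint rule, and the function names are
assumptions of this example.

```python
def l1_distance(f, g, n=20000):
    """Midpoint-rule approximation of the L1 distance between f and g on [0, 1]."""
    return sum(abs(f((k + 0.5) / n) - g((k + 0.5) / n)) for k in range(n)) / n

def beta1_density(b):
    """Density b(1 - x)^(b - 1) of be(1, b); bounded on [0, 1] for b >= 1."""
    return lambda x: b * (1.0 - x) ** (b - 1.0)

def mixture_density(pi, h):
    """f_{pi,h}(x) = pi + (1 - pi) * h(x)."""
    return lambda x: pi + (1.0 - pi) * h(x)

pi0, h0 = 0.80, beta1_density(4.0)
pi1, h1 = 0.75, beta1_density(5.0)

lhs = l1_distance(mixture_density(pi1, h1), mixture_density(pi0, h0))
rhs = 2.0 * abs(pi1 - pi0) + l1_distance(h1, h0)
```

For these values `lhs` is well below `rhs`, consistent with the inequality
$\|f_{\pi,h} - f_{\pi_0,h_0}\|_1 \le 2|\pi - \pi_0| + \|h - h_0\|_1$.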

In view of Proposition 3, the following "upper semi-consistency" (a form of a 
one-sided consistency) may be concluded. 

Corollary 1. Under the conditions of Theorem 1, we have that for any $\epsilon > 0$,
$\Pr(\pi < \pi_0 + \epsilon \mid X_1, \ldots, X_m) \to 1$ a.s. and that the posterior mean $\hat\pi_m$ satisfies
$\limsup_{m\to\infty} \hat\pi_m \le \pi_0$ a.s.

Unfortunately, the above corollary has limited significance since typically one
would not like to underestimate the true $\pi$ (and the pFDR) while overestimation is
less serious. In order to ensure that the convergence takes place, we need to enforce
an additional restriction on the support of the prior to ensure continuity of $\pi(F)$ with
respect to the weak topology on the restricted space.

Corollary 2. Assume that $\Pi$ is supported in $\mathcal{B} \cap \mathcal{D}$ and that $h_0$ belongs to the
$L_1$-support of $\Pi$. Then for any $\epsilon > 0$, $\Pr(|\pi - \pi_0| < \epsilon \mid X_1, \ldots, X_m) \to 1$ a.s. and
$\hat\pi_m \to \pi_0$ a.s.

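Further, for any $0 < \alpha < 1$ and $\epsilon > 0$,

\[ \Pr\bigl(|\mathrm{pFDR}(F; \alpha) - \mathrm{pFDR}(F_0; \alpha)| < \epsilon \mid X_1, \ldots, X_m\bigr) \to 1 \text{ a.s.}, \]

and the above convergence is uniform for $\alpha$ lying in compact subsets of (0, 1].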

Proof. The proof of the first assertion follows from Theorem 1 and Proposition 6. 

The second assertion follows from the first because $\pi_n \to \pi_0$ and $F_n(\alpha) \to F_0(\alpha)$
imply that $\pi_n\alpha/F_n(\alpha) \to \pi_0\alpha/F_0(\alpha)$ whenever $0 < F_0(\alpha) < 1$, and this holds
whenever $0 < \alpha < 1$. In fact, the convergence is uniform over compact subsets of
(0, 1], because $F_0(\alpha)$ remains uniformly bounded below there. □

Now we consider a concrete prior obtained from a Dirichlet mixture of betas: Let
$h(x) = \int \mathrm{be}(x; a, b)\,dG(a, b)$, where $G \sim \mathrm{DP}(G_0, \tau)$ and $G_0$ is a probability measure
on $(0, 1) \times (1 + \epsilon, \infty)$ with full support. The lower bound $b > 1 + \epsilon$ ensures that
the survival function $\bar H = 1 - H$ satisfies

\[ \bar H(1 - y) \le y^{1+\epsilon} \quad \text{for all } 0 < y < 1, \]

since be$(a, b)$ is stochastically dominated by be$(1, b)$ (by the MLR property of the beta
distribution) and taking mixtures preserves bounds for the probability of a given
set. This ensures that any $H$ in the support of the prior lies in $\mathcal{B}$. This leads to the
following consistency result for a Dirichlet mixture of beta prior.
Theorem 2 (Full $L_1$-support of beta mixture prior). For any true $h_0 \in \mathcal{B} \cap \mathcal{D}$
lying in the $L_1$-closure of the above beta mixtures, consistency of pFDR holds for the
Dirichlet mixture of beta prior if the center measure $G_0$ has support $[0, 1] \times [1 + \epsilon, \infty)$.

Proof. First let $h_0(x) = h_{Q_0}(x) = \int \mathrm{be}(x; a, b)\,dQ_0(a, b)$. Given $\epsilon > 0$, find $\eta > 0$
and $M < \infty$ such that $Q_0\{a < \eta \text{ or } b > M\} < \epsilon$. Let $Q_0^*$ be $Q_0$ restricted and
re-normalized to $[\eta, 1] \times [1, M]$. Then by Lemma A.3 of Ghosal and van der Vaart
(2001), it follows that $\|h_{Q_0} - h_{Q_0^*}\|_1 < 2\epsilon$. Thus it suffices to assume that $Q_0$ is
supported over $[\eta, 1] \times [1 + \epsilon, M]$ for some $\eta > 0$ and $M < \infty$. Now if $Q_n$ is a sequence
converging weakly to $Q_0$, we may also assume that $Q_n\{a < \eta \text{ or } b > M\} < \epsilon$ for
all $n$, so that $\|h_{Q_n} - h_{Q_n^*}\|_1 < 2\epsilon$ and $Q_n^*$ converges weakly to $Q_0$. For any
$0 < x < 1$, the beta kernel is a bounded continuous function on $[\eta, 1] \times [1, M]$, and
hence $h_{Q_n^*}(x) \to h_{Q_0}(x)$. Scheffé's theorem then implies that $\|h_{Q_n^*} - h_{Q_0}\|_1 \to 0$.
Thus, given any $\epsilon > 0$, if $Q$ lies in a sufficiently small weak neighborhood of $Q_0$,
then $\|h_Q - h_{Q_0}\|_1 < \epsilon$. As the center measure $G_0$ has support $[0, 1] \times [1 + \epsilon, \infty)$,
the corresponding Dirichlet process has full weak support. Thus $h_0$ belongs to the
$L_1$-support of the prior, and hence consistency holds by Corollary 2.

Now more generally, if $h_0$ can be approximated by beta mixtures in the $L_1$-sense,
then also $h_0$ lies in the $L_1$-support as the support is a closed set. Hence consistency
is obtained. □

Remark 2. Proposition 8 gives a sufficient condition for $h_0$ to be in the $L_1$-closure
of beta mixtures.

Remark 3. By Fubini's theorem, the result continues to hold even if $\tau$ is given a
prior and $G_0$ contains hyperparameters.

4. Conclusion 

A mixture of beta densities be(a, b) with a < 1 and b > 1 forms a rich class of 
densities with shapes like a reflected J. It is shown that, under various natural sce- 
narios, such densities are appropriate for modeling the density of p-values arising 
from alternative hypotheses. We have also shown that if for any c.d.f. $H$, $H(e^{-y})$
is a completely monotone function of $y$, then the corresponding density $h$ is representable
exactly as a mixture of the above mentioned beta densities. The mixture
model is especially useful for Bayesian inference, where priors can be induced upon
the mixture densities through a Dirichlet process prior on the mixing distribution. 
When hypotheses are randomly assigned as null or alternative with a specific prob- 
ability, then the p-value distribution is a mixture of a uniform component and a 
mixture of beta densities of the type mentioned above. By applying the general the- 
ory of posterior consistency for density estimation, we have shown that the posterior 
distribution for estimating the density of p-values is consistent at the true density 
if it is of the given form and the prior on the mixing distribution has every distribu- 
tion in its weak support. Under some further conditions which essentially separate 
mixtures of beta densities from the uniform, it follows that posterior consistency for 
density estimation leads to consistency in estimating positive false discovery rates 
for multiple hypotheses testing. This property gives asymptotic justification of a 
recently proposed Bayesian method of estimating positive false discovery rates by
the same set of authors. 